What is Turing?
It’s a Perl-like language where memory management is solely up to the user. It’s barely object-oriented, allowing for classes, inheritance (and abstract classes) and objects - nothing beyond that.
Backstory
A while ago, I started on my Game Engine in Turing - along with a preliminary 2D physics engine. Unbeknownst to me, OpenTuring is an object-oriented programming languages. Wow! However, spoiled by C#, I didn’t understand the concept of “clean up your own trash” and found “free after use” quite irritating and annoying.
Before we begin I’ll assume you have a solid grasp of object-oriented concepts, and a solid understanding of pointers and references. This article aims to demonstrate the furthest limits of the Turing language, with no practical goal in mind.
Memory Allocation
Let’s suppose that we want to create an object. We may do something like this (in C++):
1 |
|
Which is really just (in C):
1 |
|
Note that every malloc
must have a free
or else you’re just allocating resources without freeing.
Thus, to destroy the object, we can:
1 |
|
Which is really just:
1 |
|
Ok, but what happens when you don’t free resources? Well, a memory leak occurs! Consider this code:
1 |
|
When run, the program will continuously allocate memory for Object
without freeing it. This is terrible news as our precious memory will run out the program will crash.
Memory leaks happen when you forget to free objects and the program’s memory usage slowly blows up. Solution? Enter garbage collection (GC)!
How to GC 101
Garbage collection is a form of automatic memory management, where memory and objects no longer used by the program is released. So how do we implement a GC? We need to somehow:
- Keep track of all objects
- Figure out which ones are not in use by the program anymore
Before we continue… Reflection!
In computer science, reflection is the ability for a program to examine and modify it’s own structure and behaviour (preferably during runtime). Foundry from the (probably dead) compsci.ca
forums posted about achieving reflection in Turing. How did he do it? That’s beyond me. Here’s my version of his Invoke.tu
(modified to fit my purpose):
Tracking Objects
This is relatively easy to do. Since Turing provides no easy way to override operators, we can instead create a proxy method called New
that creates a new object. We can then use reflection to create a new object!
So I defined a new procedure called New
which will 2 parameters. The an address of the variable that will store the reference of the object we’ll create and a class type:
Now how the hell do we pass a type as a parameter in Turing? Well, we force Turing to interpret it as a pointer to something. Since Turing allows raw pointers, the object pointed will be in the form of:
We can then pass the baseClass
as a parameter to the NEWCLASS
JIT opcode and the object should be created. Then we manually craft the JIT code and force Turing to execute it!
Checking Object in Use
Notice that we pass a pointer to the reference of object to the New
procedure. That is, if we store the object in the variable obj
, then New
accepts the address of obj
. This is useful as if obj
goes out of scope, so does the object.
Using this, we can track the number of references (in-scope pointers) to the object and when the number of references reaches zero, we may assume the object is no longer being used by the program (it is no longer accessible). However, this brings up the problem of assignment - if ptr1
points to our Object
and we allow ptr1 = ptr2
then leave the scope of ptr1
, our GC does not know that ptr2
still has a reference to the object. This can lead the GC to potentially free an object still in use,
In C++, the solution would be to create a custom pointer type and override the assignment operators to keep track of such assignments. For instance, an assignment of ptr2 = ptr1
would increase the object’s reference count by 1. Likewise, an assignment of NULL
(or similar value) to ptr1
would result in a decrease of the object’s reference count by 1.
Unfortunately in Turing, this is not a viable solution. Instead, when a new object is created, the address of the pointer to that object and the object’s address are stored in a list. When another object is created or destroyed, we loop through that list once and count the references to each object. Then, we loop through a list of allocated objects and objects with references of zero are destroyed.
However, (just like in C++) we would have to create custom pointer types. For instance:
1 |
|
Will not work. Instead, we have to cook up some crazy shit like:
1 |
|
Finally, we must tackle the question of knowing when a pointer falls out of scope. To do this, I’ll briefly introduce the stack frame. When a function is called, it allocates a stack frame at the top of the stack (stack grows down from higher memory addresses) - this stack frame is where most local variables of that function is stored. This will allow us to determine the scope of a variable by comparing its address with the top of the stack. When a function exists, the stack frame is popped and the frame below is restored.
1 |
|
Notice that as we enter bar()
from for()
, the addresses of pointers within bar()
will be lower than the pointers within foo()
. So, when we exit the scope of foo()
all pointers with addresses lower than ptr1
will be out of scope too. Thus, when we allocate a new object and we obtain the address of the current pointer, all current pointers whose address is lower than the current one is out of scope and can be freed. A better solution would be to obtain the current top of the stack and free all pointers that way. But this works so far.
Implementation (so far…)
Here is an proof of concept of the algorithm outlined above. We take advantage of Turing’s flex
(flexible) arrays to store a list of pointers/references.
Here is an example script to test the GC:
And a screenshot of the results:
Oh yeah, you can find the full set of files here.
Next Steps? (Function Parameters)
This is not possible